12 research outputs found

    SMART-KG: Hybrid Shipping for SPARQL Querying on the Web

    While Linked Data (LD) provides standards for publishing (RDF) and (SPARQL) querying Knowledge Graphs (KGs) on the Web, serving, accessing and processing such open, decentralized KGs is often practically impossible, as query timeouts on publicly available SPARQL endpoints show. Alternative solutions such as Triple Pattern Fragments (TPF) attempt to tackle the problem of availability by pushing query processing workload to the client side, but suffer from unnecessary transfer of irrelevant data on complex queries with large intermediate results. In this paper we present smart-KG, a novel approach to share the load between servers and clients, while significantly reducing data transfer volume, by combining TPF with shipping compressed KG partitions. Our evaluations show that smart-KG outperforms state-of-the-art client-side solutions and increases server-side availability towards more cost-effective and balanced hosting of open and decentralized KGs

    LETEO: Scalable anonymization of big data and its application to learning analytics

    ANII Fondo sectorial de investigación con datos - 2018. Created in 2007, Plan Ceibal is an inclusion and equal opportunities plan that aims to support Uruguayan educational policies with technology. Over the years, and within the framework of its tasks, Ceibal has accumulated a significant amount of data related to the use of technology in education, which is necessary to manage the plan and fulfill its assigned legal duties. However, this data cannot be studied without first addressing the problem of de-identifying the users of the Plan. To exploit this data, Ceibal has deployed an instance of the Hortonworks Data Platform (HDP), an open source platform for the storage and parallel processing of massive data (big data). HDP offers a wide range of functional components, from large file storage (HDFS) to distributed programming of machine learning algorithms (Apache Spark / MLlib). However, as of today there are no open source solutions for the de-identification of personal data that are integrated into the Hortonworks ecosystem. On the one hand, existing data de-identification tools have not been designed to scale easily to large volumes of data, and they do not offer easy integration mechanisms with HDFS. This forces users to export the data outside the platform that stores it in order to anonymize it, with the consequent risk of exposing confidential information. On the other hand, the few solutions integrated into the Hortonworks ecosystem are proprietary, and the cost of their licenses is very significant. The objective of this project is to promote the use of the enormous amount of educational and technological data that Ceibal possesses by lifting one of the greatest obstacles to that use, namely the preservation of privacy and the protection of the personal data of the beneficiaries of the Plan. To this end, the project seeks to build anonymization tools that extend the HDP platform. In particular, it seeks to develop open source modules that integrate into the platform, implement a set of anonymization techniques and algorithms programmed in a distributed manner using Apache Spark, and can be applied to data sets stored in HDFS files

    Development of a Semantic Web Solution for Directory Services

    The motivation for this work is a common problem in organizations: accessing and managing the growing amount of data stored in companies. Companies can use emerging Semantic Web technology to address this problem. Invenio AS needs to access a directory service efficiently, and the Semantic Web languages can be used to achieve this. This thesis presents a literature study of the main ontology languages proposed by the World Wide Web Consortium, RDF(S) and OWL with its extension for Web services, OWL-S, and of the ontology language proposed by the International Organization for Standardization, Topic Maps. This literature study can serve as an introduction to the Web ontology languages RDF, OWL (and OWL-S) and Topic Maps. A model of the databases has been extracted and designed in UML. The extracted model has been used to create a common ontology that merges both of the initial databases. The ontology representing the database in the three languages has been analysed. The quality and semantic accuracy of the languages for the Invenio case have been analysed, and we have obtained detailed results from this analysis


    PromoterLCNN: A Light CNN-Based Promoter Prediction and Classification Model

    Promoter identification is a fundamental step in understanding bacterial gene regulation mechanisms. However, accurate and fast classification of bacterial promoters continues to be challenging. New methods based on deep convolutional networks have been applied to identify and classify bacterial promoters recognized by sigma (σ) factors and RNA polymerase subunits which increase affinity to specific DNA sequences to modulate transcription and respond to nutritional or environmental changes. This work presents a new multiclass promoter prediction model by using convolutional neural networks (CNNs), denoted as PromoterLCNN, which classifies Escherichia coli promoters into subclasses σ70, σ24, σ32, σ38, σ28, and σ54. We present a light, fast, and simple two-stage multiclass CNN architecture for promoter identification and classification. Training and testing were performed on a benchmark dataset, part of RegulonDB. Comparative performance of PromoterLCNN against other CNN-based classifiers using four parameters (Acc, Sn, Sp, MCC) resulted in similar or better performance than those that commonly use cascade architecture, reducing time by approximately 30–90% for training, prediction, and hyperparameter optimization without compromising classification quality

    SPARQLES: monitoring public SPARQL endpoints

    We describe SPARQLES: an online system that monitors the health of public SPARQL endpoints on the Web by probing them with custom-designed queries at regular intervals. We present the architecture of SPARQLES and the variety of analytics that it runs over public SPARQL endpoints, categorised by availability, discoverability, performance and interoperability. We also detail the interfaces that the system provides for human and software agents to learn more about the recent history and current state of an individual SPARQL endpoint or about overall trends concerning the maturity of all endpoints monitored by the system. We likewise present some details of the performance of the system and the impact it has had thus far. Fujitsu Laboratories Limited CONICYT/FONDECYT Project 3130617 FONDECYT Project 11140900 DGIP Project 116.24.1 Millennium Nucleus Center for Semantic Web Research NC12000

    GenoVi, an open-source automated circular genome visualizer for bacteria and archaea.

    The increase in microbial sequenced genomes from pure cultures and metagenomic samples reflects the current attainability of whole-genome and shotgun sequencing methods. However, software for genome visualization still lacks automation, integration of different analyses, and customizable options for non-experienced users. In this study, we introduce GenoVi, a Python command-line tool able to create custom circular genome representations for the analysis and visualization of microbial genomes and sequence elements. It is designed to work with complete or draft genomes, featuring customizable options including 25 different built-in color palettes (including 5 color-blind safe palettes), text formatting options, and automatic scaling for complete genomes or sequence elements with more than one replicon/sequence. Using a GenBank format file as the input file or multiple files within a directory, GenoVi (i) visualizes genomic features from the GenBank annotation file, (ii) integrates a Cluster of Orthologs Group (COG) categories analysis using DeepNOG, (iii) automatically scales the visualization of each replicon of complete genomes or multiple sequence elements, and (iv) generates COG histograms, COG frequency heatmaps and output tables including general stats of each replicon or contig processed. GenoVi's potential was assessed by analyzing single and multiple genomes of Bacteria and Archaea. Paraburkholderia genomes were analyzed to obtain a fast classification of replicons in large multipartite genomes. GenoVi works as an easy-to-use command-line tool and provides customizable options to automatically generate genomic maps for scientific publications, educational resources, and outreach activities. GenoVi is freely available and can be downloaded from https://github.com/robotoD/GenoVi

    D1.1.3: NeOn Formalisms for Modularization: Syntax, Semantics, Algebra

    The goal of this document is to come up with a formalism for ontology modularization, including syntaxes and the fundamental properties of a semantics of such a formalism. Furthermore, we introduce operators to create, combine and manipulate ontology modules and give formal definitions for these operators based on the semantics of ontology modules. The definition of the NeOn formalism for modularization and of the operators to manipulate ontology modules are guided by a number of use cases and examples, from NeOn case studies and other work packages